<- tar_read(count_matrix) count_matrix
3 Meat microbiome example
This document demonstrates a practical use case of differential analysis based on microbial functions identified in a small dataset. It uses the f3mr package and a pipeline managed by the targets
framework, then applies DESeq2 to detect enriched functions between two experimental conditions: AIR
and VPA
.
3.1 import count table
This section loads the count matrix produced by the targets pipeline. The matrix contains functional abundances of genes or metabolic categories across different samples.
3.2 create samples metadata
Here, a metadata table is created based on the column names of the count matrix. Each sample is annotated with a condition (AIR or VPA), which is automatically extracted from the sample name.
<- colnames(count_matrix) %>%
metadata tibble(sample = .) %>%
mutate(
condition = if_else(str_detect(sample, "AIR"), "AIR", "VPA") %>% as.factor()
)
metadata
3.3 Prepare DESeq2 dataset
A DESeqDataSet
object is created using the count matrix and the metadata. The model is defined with condition
as the variable of interest to detect differentially abundant functions between AIR and VPA conditions.
<- DESeqDataSetFromMatrix(
dds countData = count_matrix,
colData = metadata,
design = ~ condition
)
3.4 Run DESeq2 pipeline
This step runs the standard DESeq2 pipeline, which includes normalization, dispersion estimation, and hypothesis testing to identify significant functional changes between the two conditions.
<- DESeq(dds) dds
3.5 Extract differential expression results
The statistical results from DESeq2 are extracted, including the log2 fold changes and adjusted p-values (padj
) for each function or gene.
<- results(dds)
res
<- res %>%
res_tibble %>%
as.data.frame rownames_to_column(var = "gene") %>% as_tibble()
3.6 Vizualize results
A volcano plot is generated to visualize the results. Each point represents a function, plotted by log2 fold change (magnitude of change) and -log10 p-value (statistical significance). Functions with significant changes are highlighted in color.
# Volcano plot
<- res_tibble %>%
p ggplot(aes(x = log2FoldChange, y = -log10(pvalue), color = padj < 0.05)) +
geom_point(alpha = 0.8) +
theme_minimal() +
labs(title = "Volcano Plot", x = "Log2 Fold Change", y = "-Log10 P-Value")
p
3.7 significant results
The results are filtered to show only those functions with an adjusted p-value below 0.05, and sorted by the absolute log2 fold change. This helps highlight the most biologically meaningful differences.
<- res_tibble %>%
significant_genes filter(padj < 0.05)
%>%
significant_genes arrange(desc(abs(log2FoldChange)))
3.8 add an offset based on basal metabolism
The count matrix is converted again into a DESeqDataSet
, this time with the basal metabolism values included as an offset. This tells DESeq2 to account for inherent background signal when performing differential testing.
<- tar_read(basal_metabolism_matrix)
basal_metabolism_matrix
is.na(basal_metabolism_matrix)] <- 0
basal_metabolism_matrix[
basal_metabolism_matrix
The count matrix is converted again into a DESeqDataSet
, this time with the basal metabolism values included as an offset. This tells DESeq2 to account for inherent background signal when performing differential testing.
<- tar_read(count_matrix)
count_matrix
# Prepare DESeq2 data
<- DESeqDataSetFromMatrix(
dds countData = count_matrix,
colData = metadata,
design = ~ 1
)
assays(dds)[["offset"]] <- basal_metabolism_matrix
3.9 Run DESeq2 with offset
The model design is updated to include the condition variable again, and the DESeq2 analysis is re-run. This version of the test is more robust to background metabolic variation thanks to the offsets.
# Specify a new design that includes offsets
design(dds) <- ~ condition # Add any additional covariates as needed
# Run DESeq2
<- DESeq(dds)
dds
resultsNames(dds)
[1] "Intercept" "condition_VPA_vs_AIR"
The adjusted results are extracted from the DESeq2 object. These results now reflect differences between AIR and VPA that go beyond the expected background metabolic levels.
# Extract results
<- results(dds)
cap_results
<- cap_results %>%
cap_results_tibble %>%
as.data.frame rownames_to_column(var = "gene") %>% as_tibble()
An other volcano plot is created based on the adjusted results. Functions with an adjusted p-value below 0.001 are emphasized to highlight the most confidently differentially abundant functions after background correction.
# Volcano plot
<- cap_results_tibble %>%
p ggplot(aes(x = log2FoldChange, y = -log10(pvalue), color = padj < 0.001)) +
geom_point(alpha = 0.8) +
theme_minimal() +
labs(title = "Volcano Plot", x = "Log2 Fold Change", y = "-Log10 P-Value")
p
Finally, the significantly differentially abundant functions are filtered from the adjusted results and sorted by the absolute log2 fold change. This provides a refined list of candidate functions to investigate further.
<- cap_results_tibble %>%
significant_genes filter(padj < 0.001)
%>%
significant_genes arrange(desc(abs(log2FoldChange)))